Identify Temporal Websites Based on User Behavior Analysis
Authors
Abstract
The web is growing rapidly, and it is almost impossible for a web crawler to download every new page. Pages reporting breaking news should be added to the search engine index as soon as they are published, while pages whose content is not time-sensitive can be left for later crawls. We collected and analyzed users' page-view data covering 75,112,357 pages over 60 days. Using this data, we found that a large proportion of temporal pages are published by a small number of websites that provide news services, and these sites should be crawled repeatedly at short intervals. Our algorithm, based on analysis of user behavior in the page-view data, identifies such temporal websites with high freshness requirements. It picks up 51.6% of all temporal pages while incurring only a small overhead of non-temporal pages. With this method, web crawlers can focus on these websites and download their pages with high priority.
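The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It labels a page as temporal with a hypothetical rule (most of its views arrive within the first day). It then selects sites whose share of temporal pages meets a hypothetical `min_ratio`. Finally, it reports the two quantities the abstract mentions: coverage of all temporal pages and the overhead of non-temporal pages within the selected sites. The function names, thresholds, and the exact overhead definition are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def is_temporal(daily_views: List[int], window_days: int = 1, share: float = 0.5) -> bool:
    """Hypothetical rule: a page is temporal if at least `share` of its total
    views fall within the first `window_days` days after it first appears."""
    total = sum(daily_views)
    if total == 0:
        return False
    return sum(daily_views[:window_days]) / total >= share


def select_temporal_sites(pages: Iterable[Tuple[str, bool]],
                          min_ratio: float = 0.5,
                          min_pages: int = 1) -> Set[str]:
    """Pick sites whose fraction of temporal pages is at least `min_ratio`
    (hypothetical selection criterion)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # site -> [temporal, total]
    for site, temporal in pages:
        counts[site][1] += 1
        if temporal:
            counts[site][0] += 1
    return {site for site, (t, n) in counts.items()
            if n >= min_pages and t / n >= min_ratio}


def coverage_and_overhead(pages: Iterable[Tuple[str, bool]],
                          selected: Set[str]) -> Tuple[float, float]:
    """Coverage: share of all temporal pages hosted on selected sites.
    Overhead (assumed definition): share of pages on selected sites that are non-temporal."""
    pages = list(pages)
    temporal_total = sum(1 for _, t in pages if t)
    in_selected = [t for site, t in pages if site in selected]
    covered = sum(in_selected)
    coverage = covered / temporal_total if temporal_total else 0.0
    overhead = (len(in_selected) - covered) / len(in_selected) if in_selected else 0.0
    return coverage, overhead


if __name__ == "__main__":
    # Toy page-view histories (views per day since first appearance); invented for illustration.
    views = {
        ("news.example", "a"): [90, 5, 5],
        ("news.example", "b"): [80, 10, 10],
        ("news.example", "c"): [10, 40, 50],
        ("blog.example", "d"): [60, 20, 20],
        ("blog.example", "e"): [5, 5, 90],
        ("docs.example", "f"): [30, 30, 40],
    }
    labeled = [(site, is_temporal(v)) for (site, _), v in views.items()]
    selected = select_temporal_sites(labeled, min_ratio=0.6)
    print(selected, coverage_and_overhead(labeled, selected))
```

On this toy data only `news.example` is selected. It covers two of the three temporal pages, and one of its three pages is non-temporal overhead. The trade-off between those two numbers is what a threshold like `min_ratio` would tune.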
Similar resources
Analysis of user query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)
Background and Aim: Information systems cannot be well designed or developed without a clear understanding of users' needs and of how they seek and evaluate information. This research was designed to analyze the query refinement behavior of users of Ganj (the database of the Iranian research institute of science and technology) through log analysis. Methods: The method of this research is log anal...
A New Trust Model for B2C E-Commerce Based on 3D User Interfaces
Lack of trust is one of the key bottlenecks in e-commerce development. Many advanced technologies now try to address trust issues in e-commerce; one of them suggests using suitable user interfaces. This paper investigates the functionality and capabilities of 3D graphical user interfaces for building trust among customers of the next generation of B2C e-commerce websit...
Quality assessment of Persian depression-related websites based on the WebMedQual scale
Introduction: Nowadays, anyone with some knowledge of the Internet can act as a producer and distributor of information. Unlike most traditional media for transmitting information, the Internet lacks information control and content quality management. As a result, the quality of health information on the Internet is questionable. The objective of this study is to guide patients to ...
TAM2-based Study of Website User Behavior—Using Web 2.0 Websites as an Example
In recent years, we have seen a return of web-based applications built with new ideas and new commercial models. The key momentum for the development of such applications is the Web 2.0 technology. Web 2.0 websites are dynamic and characterized by user interaction, sharing, and participation. The emergence of this new business model brings new business opportunities. In fact, website users are ...
Analysis of Usage Patterns in Large Multimedia Websites
User behavior on a website is a critical indicator of the website's usability and success. Therefore, an understanding of usage patterns is essential to optimizing website design. In this context, large multimedia websites pose a significant challenge to understanding the complex and diverse user behaviors they sustain. This is due to the complexity of analyzing and understanding user-dat...